This Jupyter notebook documents the ETL (Extract, Transform, Load) process for preparing Inzicht Verlicht data to be used in building MIM2 data models for sharing using the Smart Data models by FIWARE. The steps outlined here will be used as a running example in creating a MIM2 creation process document, ensuring an efficient approach to data preparation for future MIM2 development.
This presentation explains Smart Data Models in detail: https://docs.google.com/presentation/d/1k021ZzRyk2PtikubOFNa3KyXeXL5nTmWWxyhs0Y0r1k/edit#slide=id.g10849e1f66c_0_499
ETL stands for Extract, Transform, Load. It's a crucial process in data management that involves:
Extracting data from its source (in this case, the Inzicht Verlicht data).
Transforming the data to meet the requirements of the target system (MIM2 data models). This involves cleaning, formatting, and manipulating the data.
Loading the transformed data into a CSV (here, the data will be used to build MIM2 data models).
By documenting the ETL process in this notebook, we gain several advantages:
Reproducibility: Anyone can follow these steps and replicate the data preparation for future MIM2 projects.
Standardization: This ensures consistency in how the Inzicht Verlicht data is prepared for MIM2 usage.
Efficiency: Having a documented process saves time and effort by avoiding the need to reinvent the wheel for each project.
Clarity: The documentation provides a clear understanding of how the data is transformed, facilitating troubleshooting and potential improvements.
The following sections of this notebook will delve into each stage of the ETL process for Inzicht Verlicht data used in MIM2 data model creation. The image below shows the different MIMs where this project is focused on MIM2 data models and sharing.
Loading in the python libraries for the ETL process.
import pandas as pd
import numpy as np
from IPython.display import Image
from datetime import datetime
import plotly.offline as pyo
pyo.init_notebook_mode()
df_vinotion_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/df_vinotion_with_seconds.csv', sep=',')
df_vinotion_full
| datetime_short | cameraId | classification | ruleId | speed | uuid | |
|---|---|---|---|---|---|---|
| 0 | 2021-11-02:09:51:39 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 54.49428 | be698d9d-35dd-4588-8f1a-149c116c3973 |
| 1 | 2021-11-02:09:51:40 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 50.97996 | 91a1416c-c3f3-4498-9dae-8d32c0ae7457 |
| 2 | 2021-11-02:09:51:41 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 55.75644 | ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78 |
| 3 | 2021-11-02:09:51:42 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 51.20208 | bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88 |
| 4 | 2021-11-02:09:51:43 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 60.90840 | c8be14e9-c24a-4542-bbc5-5e389d0a6034 |
| ... | ... | ... | ... | ... | ... | ... |
| 2370144 | 2021-11-30:22:59:51 | wss://10.252.229.14/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 38.90988 | c24f5be9-2474-4b80-b54c-766398f462ec |
| 2370145 | 2021-11-30:22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 44.01072 | 2ced7471-922d-4e35-922c-e854acf891c7 |
| 2370146 | 2021-11-30:22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 40.30704 | 7bdb8752-fd32-4cb2-b394-e7e900b95fd5 |
| 2370147 | 2021-11-30:22:59:56 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 71.63712 | 6667479f-1b76-4a9b-905c-5387edcf16fd |
| 2370148 | 2021-11-30:22:59:57 | wss://10.252.229.12/api/v2/data/main/object | Car | Application/Channel/Count/0/Processing/CountRu... | 60.88536 | 9ce75489-b1af-4d91-8810-16dddff967f1 |
2370149 rows × 6 columns
df_sorama_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/modified_data.csv', sep=',')
df_sorama_full['deviceId'] = df_sorama_full['deviceId'].astype(str)
df_sorama_full
| deviceId | Content_time | Content_value | Date | Time | Year | Month | Hour | Datetime | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 | 1900-01-01 09:49:22.310 | 2021 | 11 | 9 | 2021-11-02 09:49:22.310000-01:00 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 | 1900-01-01 09:49:23.310 | 2021 | 11 | 9 | 2021-11-02 09:49:23.310000-01:00 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 | 1900-01-01 09:49:24.310 | 2021 | 11 | 9 | 2021-11-02 09:49:24.310000-01:00 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 | 1900-01-01 09:49:25.310 | 2021 | 11 | 9 | 2021-11-02 09:49:25.310000-01:00 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 | 1900-01-01 09:49:26.310 | 2021 | 11 | 9 | 2021-11-02 09:49:26.310000-01:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 | 1900-01-01 09:49:06.610 | 2021 | 11 | 9 | 2021-11-02 09:49:06.610000-01:00 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 | 1900-01-01 09:49:07.610 | 2021 | 11 | 9 | 2021-11-02 09:49:07.610000-01:00 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 | 1900-01-01 09:49:08.610 | 2021 | 11 | 9 | 2021-11-02 09:49:08.610000-01:00 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 | 1900-01-01 09:49:09.610 | 2021 | 11 | 9 | 2021-11-02 09:49:09.610000-01:00 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 | 1900-01-01 09:49:10.610 | 2021 | 11 | 9 | 2021-11-02 09:49:10.610000-01:00 |
1048575 rows × 9 columns
To be able to publish the Vinotion and Sorama data to a data space, the data sets will need to be converted into the Smart Data Models format by FIWARE. These models are standardised models created by FIWARE. This conversion is MIM2 (Minimal Interoperable Mechansim 2) - data models and sharing. With this conversion, an API can then be used (MIM1) to publish the data to a data space.
The conversion of the Vinotion will use the TrafficFlowObserved SMD model : https://github.com/smart-data-models/dataModel.Transportation/blob/master/TrafficFlowObserved/doc/spec.md
The conversion of thE Sorama data will use the NoiseLevelObserved SMD model : https://github.com/smart-data-models/dataModel.Environment/blob/master/NoiseLevelObserved/doc/spec.md
Figure 2: Data Model conversion of Inzicht Verlicht Data to Smart Data Model
#Print Columns
vinotion_columns = df_vinotion_full.columns
print(vinotion_columns)
Index(['datetime_short', 'cameraId', 'classification', 'ruleId', 'speed',
'uuid'],
dtype='object')
Selecting the necessary columns from the original CSV file.
# Select specific columns
df_vinotion_SMD = df_vinotion_full[['datetime_short','cameraId', 'classification', 'speed', 'uuid']].copy()
df_vinotion_SMD
| datetime_short | cameraId | classification | speed | uuid | |
|---|---|---|---|---|---|
| 0 | 2021-11-02:09:51:39 | wss://10.252.229.12/api/v2/data/main/object | Car | 54.49428 | be698d9d-35dd-4588-8f1a-149c116c3973 |
| 1 | 2021-11-02:09:51:40 | wss://10.252.229.12/api/v2/data/main/object | Car | 50.97996 | 91a1416c-c3f3-4498-9dae-8d32c0ae7457 |
| 2 | 2021-11-02:09:51:41 | wss://10.252.229.12/api/v2/data/main/object | Car | 55.75644 | ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78 |
| 3 | 2021-11-02:09:51:42 | wss://10.252.229.12/api/v2/data/main/object | Car | 51.20208 | bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88 |
| 4 | 2021-11-02:09:51:43 | wss://10.252.229.12/api/v2/data/main/object | Car | 60.90840 | c8be14e9-c24a-4542-bbc5-5e389d0a6034 |
| ... | ... | ... | ... | ... | ... |
| 2370144 | 2021-11-30:22:59:51 | wss://10.252.229.14/api/v2/data/main/object | Car | 38.90988 | c24f5be9-2474-4b80-b54c-766398f462ec |
| 2370145 | 2021-11-30:22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | 44.01072 | 2ced7471-922d-4e35-922c-e854acf891c7 |
| 2370146 | 2021-11-30:22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | 40.30704 | 7bdb8752-fd32-4cb2-b394-e7e900b95fd5 |
| 2370147 | 2021-11-30:22:59:56 | wss://10.252.229.12/api/v2/data/main/object | Car | 71.63712 | 6667479f-1b76-4a9b-905c-5387edcf16fd |
| 2370148 | 2021-11-30:22:59:57 | wss://10.252.229.12/api/v2/data/main/object | Car | 60.88536 | 9ce75489-b1af-4d91-8810-16dddff967f1 |
2370149 rows × 5 columns
Aggregating the data to one minute intervals, since the Smart Data Models have the averageSpeed and dateObservedFrom and DateObservedTo columns. One minute intervals were used as discussed with Eindhoven Municipality and thats what they prefered.
# Convert 'dateObservedFrom' and 'dateObservedTo' columns to datetime type
df_vinotion_SMD['datetime_short'] = pd.to_datetime(df_vinotion_SMD['datetime_short'], format='%Y-%m-%d:%H:%M:%S')
df_vinotion_SMD
| datetime_short | cameraId | classification | speed | uuid | |
|---|---|---|---|---|---|
| 0 | 2021-11-02 09:51:39 | wss://10.252.229.12/api/v2/data/main/object | Car | 54.49428 | be698d9d-35dd-4588-8f1a-149c116c3973 |
| 1 | 2021-11-02 09:51:40 | wss://10.252.229.12/api/v2/data/main/object | Car | 50.97996 | 91a1416c-c3f3-4498-9dae-8d32c0ae7457 |
| 2 | 2021-11-02 09:51:41 | wss://10.252.229.12/api/v2/data/main/object | Car | 55.75644 | ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78 |
| 3 | 2021-11-02 09:51:42 | wss://10.252.229.12/api/v2/data/main/object | Car | 51.20208 | bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88 |
| 4 | 2021-11-02 09:51:43 | wss://10.252.229.12/api/v2/data/main/object | Car | 60.90840 | c8be14e9-c24a-4542-bbc5-5e389d0a6034 |
| ... | ... | ... | ... | ... | ... |
| 2370144 | 2021-11-30 22:59:51 | wss://10.252.229.14/api/v2/data/main/object | Car | 38.90988 | c24f5be9-2474-4b80-b54c-766398f462ec |
| 2370145 | 2021-11-30 22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | 44.01072 | 2ced7471-922d-4e35-922c-e854acf891c7 |
| 2370146 | 2021-11-30 22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | 40.30704 | 7bdb8752-fd32-4cb2-b394-e7e900b95fd5 |
| 2370147 | 2021-11-30 22:59:56 | wss://10.252.229.12/api/v2/data/main/object | Car | 71.63712 | 6667479f-1b76-4a9b-905c-5387edcf16fd |
| 2370148 | 2021-11-30 22:59:57 | wss://10.252.229.12/api/v2/data/main/object | Car | 60.88536 | 9ce75489-b1af-4d91-8810-16dddff967f1 |
2370149 rows × 5 columns
# Set datetime_short as the index
df_vinotion_SMD.set_index('datetime_short', inplace=True)
df_vinotion_SMD
| cameraId | classification | speed | uuid | |
|---|---|---|---|---|
| datetime_short | ||||
| 2021-11-02 09:51:39 | wss://10.252.229.12/api/v2/data/main/object | Car | 54.49428 | be698d9d-35dd-4588-8f1a-149c116c3973 |
| 2021-11-02 09:51:40 | wss://10.252.229.12/api/v2/data/main/object | Car | 50.97996 | 91a1416c-c3f3-4498-9dae-8d32c0ae7457 |
| 2021-11-02 09:51:41 | wss://10.252.229.12/api/v2/data/main/object | Car | 55.75644 | ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78 |
| 2021-11-02 09:51:42 | wss://10.252.229.12/api/v2/data/main/object | Car | 51.20208 | bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88 |
| 2021-11-02 09:51:43 | wss://10.252.229.12/api/v2/data/main/object | Car | 60.90840 | c8be14e9-c24a-4542-bbc5-5e389d0a6034 |
| ... | ... | ... | ... | ... |
| 2021-11-30 22:59:51 | wss://10.252.229.14/api/v2/data/main/object | Car | 38.90988 | c24f5be9-2474-4b80-b54c-766398f462ec |
| 2021-11-30 22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | 44.01072 | 2ced7471-922d-4e35-922c-e854acf891c7 |
| 2021-11-30 22:59:51 | wss://10.252.229.13/api/v2/data/main/object | Car | 40.30704 | 7bdb8752-fd32-4cb2-b394-e7e900b95fd5 |
| 2021-11-30 22:59:56 | wss://10.252.229.12/api/v2/data/main/object | Car | 71.63712 | 6667479f-1b76-4a9b-905c-5387edcf16fd |
| 2021-11-30 22:59:57 | wss://10.252.229.12/api/v2/data/main/object | Car | 60.88536 | 9ce75489-b1af-4d91-8810-16dddff967f1 |
2370149 rows × 4 columns
Grouping by the camera ID, and datetime_short, getting the average speed and combining the multiple values for the vehciles observed in the classification.
# Resample to one-minute intervals and aggregate by cameraId
df_vinotion_SMD = df_vinotion_SMD.groupby('cameraId').resample('1T').agg({
'classification': lambda x: list(x), # Collect all classifications into a list
'speed': 'mean', # Calculate the average speed
'uuid': lambda x: list(x) # Collect all UUIDs into a list
}).copy()
df_vinotion_SMD
| classification | speed | uuid | ||
|---|---|---|---|---|
| cameraId | datetime_short | |||
| wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... |
| 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | |
| 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | |
| 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | |
| 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | |
| ... | ... | ... | ... | ... |
| wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... |
| 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | |
| 2021-11-30 22:57:00 | [] | NaN | [] | |
| 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | |
| 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] |
205545 rows × 3 columns
Reset index to make datetime_short a column again
df_vinotion_SMD.reset_index(inplace=True)
df_vinotion_SMD
| cameraId | datetime_short | classification | speed | uuid | |
|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... |
| ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] |
205545 rows × 5 columns
In this section the columns from the Smart Data Model : TrafficFlowObeserved are added, based on which columns could be added and what columns Eindhoven Municipality wanted to have in the conversion from the Smart Data Model. The names and types for each column also match those of the Smart Data Model.
# Add a unique id column (starting from 1)
df_vinotion_SMD['id'] = range(1, len(df_vinotion_SMD) + 1)
df_vinotion_SMD
| cameraId | datetime_short | classification | speed | uuid | id | |
|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 |
| ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 |
205545 rows × 6 columns
Rename datetime_short column to dateObservedFrom.
df_vinotion_SMD.rename(columns={'datetime_short': 'dateObservedFrom'}, inplace=True)
df_vinotion_SMD
| cameraId | dateObservedFrom | classification | speed | uuid | id | |
|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 |
| ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 |
205545 rows × 6 columns
Create dateObservedTo by shifting dateObservedFrom by one minute
# Group by 'cameraId' and create 'dateObservedTo'
df_vinotion_SMD['dateObservedTo'] = df_vinotion_SMD.groupby('cameraId')['dateObservedFrom'].shift(-1)
df_vinotion_SMD
| cameraId | dateObservedFrom | classification | speed | uuid | id | dateObservedTo | |
|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT |
205545 rows × 7 columns
Adding the "type" column which is the type of Smart Data Model used, in this case its "TrafficFlowObserved".
#TrafficFlowObserved
df_vinotion_SMD['type'] = 'TrafficFlowObserved'
df_vinotion_SMD
| cameraId | dateObservedFrom | classification | speed | uuid | id | dateObservedTo | type | |
|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved |
205545 rows × 8 columns
Renaming more columns to match the Smart Data Models.
#renaming columns
df_vinotion_SMD = df_vinotion_SMD.rename(columns={'cameraId': 'source', 'classification':'vehicleType', 'speed':'averageVehicleSpeed','uuid':'name'})
df_vinotion_SMD
| source | dateObservedFrom | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | |
|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved |
205545 rows × 8 columns
Adding dateCreating and dateModeified columns to enrich the data by giving information about when the data was created and when it was modified. This is set to the current time and data, which can be modified.
# Add dateCreated and dateModified columns
df_vinotion_SMD['dateCreated'] = pd.Timestamp.now()
df_vinotion_SMD['dateModified'] = pd.Timestamp.now()
df_vinotion_SMD
| source | dateObservedFrom | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
205545 rows × 10 columns
Calculate intensity: Total number of vehicles detected during this observation period.
df_vinotion_SMD['intensity'] = df_vinotion_SMD['vehicleType'].apply(len)
df_vinotion_SMD
| source | dateObservedFrom | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 |
205545 rows × 11 columns
Adding the dataProvider column which in this case is "Vinotion".
df_vinotion_SMD['dataProvider'] = 'Vinotion'
df_vinotion_SMD
| source | dateObservedFrom | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion |
205545 rows × 12 columns
Calculating the averageHeadwayTime column which shows the average time in seconds between each vehicle. Indicating how far each vehcile is from one another. You can consider the two second rule, which is how many seconds you should be behind the vehicle in front of you.
Considering that the average is caluclated by finding the time difference between each vehicle, the average is calculated by dividing the total average time by the number of vehicles minus 1 since we want to know the average time difference between each headway and not all the vehicles (known as the fence post error).
df_vinotion_SMD['averageHeadwayTime'] = df_vinotion_SMD.groupby('source')['dateObservedFrom'].diff().dt.total_seconds().shift(-1) / (df_vinotion_SMD['intensity'] -1)
# Replace infinite values and negative values with NaN.
df_vinotion_SMD['averageHeadwayTime'].replace([np.inf, -np.inf], np.nan, inplace=True)
df_vinotion_SMD.loc[df_vinotion_SMD['averageHeadwayTime'] < 0, 'averageHeadwayTime'] = np.nan
df_vinotion_SMD
| source | dateObservedFrom | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN |
205545 rows × 13 columns
Next the "congested" boolean column needs to be made. This is done by calulating the average speed in the last hour, and the speed decrease.
# Calculate average speed of the last hour
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window=60).mean().reset_index(level=0, drop=True)
df_vinotion_SMD
| source | dateObservedFrom | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | avg_speed_last_hour | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 | NaN |
| 1 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 | NaN |
| 2 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 | NaN |
| 3 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 | NaN |
| 4 | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 | NaN |
| 205541 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 | NaN |
| 205542 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN | NaN |
| 205543 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | NaN |
| 205544 | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | NaN |
205545 rows × 14 columns
#Grouped the data by id.
#Using rolling(window='3600s') to define a rolling window of 3600 seconds (1 hour).
#Calculating the mean (.mean()) of averageVehicleSpeed within each rolling window.
#Use .reset_index(level=0, drop=True) to remove the additional index created by the rolling operation and keep the original grouping intact.
df_vinotion_SMD.set_index('dateObservedFrom', inplace=True)
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window='3600s').mean().reset_index(level=0, drop=True)
df_vinotion_SMD.reset_index(inplace=True)
df_vinotion_SMD
| dateObservedFrom | source | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | avg_speed_last_hour | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-11-02 09:51:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 | 45.112545 |
| 1 | 2021-11-02 09:52:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 | 41.997776 |
| 2 | 2021-11-02 09:53:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 | 41.193288 |
| 3 | 2021-11-02 09:54:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 | 40.489828 |
| 4 | 2021-11-02 09:55:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 | 40.586218 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 2021-11-30 22:55:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 | 52.453460 |
| 205541 | 2021-11-30 22:56:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 | 52.553246 |
| 205542 | 2021-11-30 22:57:00 | wss://10.252.229.15/api/v2/data/main/object | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN | 52.526431 |
| 205543 | 2021-11-30 22:58:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | 52.611206 |
| 205544 | 2021-11-30 22:59:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | 52.698673 |
205545 rows × 14 columns
# Calculate speed decrease compared to the average speed of the last hour
df_vinotion_SMD['speed_decrease'] = (df_vinotion_SMD['averageVehicleSpeed'] - df_vinotion_SMD['avg_speed_last_hour']) / df_vinotion_SMD['avg_speed_last_hour']
df_vinotion_SMD
| dateObservedFrom | source | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | avg_speed_last_hour | speed_decrease | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-11-02 09:51:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 | 45.112545 | 0.000000 |
| 1 | 2021-11-02 09:52:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 | 41.997776 | -0.074165 |
| 2 | 2021-11-02 09:53:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 | 41.193288 | -0.039059 |
| 3 | 2021-11-02 09:54:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 | 40.489828 | -0.052121 |
| 4 | 2021-11-02 09:55:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 | 40.586218 | 0.009500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 2021-11-30 22:55:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 | 52.453460 | 0.076120 |
| 205541 | 2021-11-30 22:56:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 | 52.553246 | 0.104432 |
| 205542 | 2021-11-30 22:57:00 | wss://10.252.229.15/api/v2/data/main/object | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN | 52.526431 | NaN |
| 205543 | 2021-11-30 22:58:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | 52.611206 | 0.031824 |
| 205544 | 2021-11-30 22:59:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | 52.698673 | 0.027672 |
205545 rows × 15 columns
# Identify congested periods based on the defined threshold
congested_threshold = 0.2 # 20% speed decrease
congested_duration_threshold = 5 # 5 minutes
df_vinotion_SMD['congested'] = (df_vinotion_SMD['speed_decrease'] <= -congested_threshold).rolling(window=congested_duration_threshold).sum() >= congested_duration_threshold
df_vinotion_SMD
| dateObservedFrom | source | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | avg_speed_last_hour | speed_decrease | congested | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-11-02 09:51:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 | 45.112545 | 0.000000 | False |
| 1 | 2021-11-02 09:52:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 | 41.997776 | -0.074165 | False |
| 2 | 2021-11-02 09:53:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 | 41.193288 | -0.039059 | False |
| 3 | 2021-11-02 09:54:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 | 40.489828 | -0.052121 | False |
| 4 | 2021-11-02 09:55:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 | 40.586218 | 0.009500 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 2021-11-30 22:55:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 | 52.453460 | 0.076120 | False |
| 205541 | 2021-11-30 22:56:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 | 52.553246 | 0.104432 | False |
| 205542 | 2021-11-30 22:57:00 | wss://10.252.229.15/api/v2/data/main/object | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN | 52.526431 | NaN | False |
| 205543 | 2021-11-30 22:58:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | 52.611206 | 0.031824 | False |
| 205544 | 2021-11-30 22:59:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | 52.698673 | 0.027672 | False |
205545 rows × 16 columns
unique_values = df_vinotion_SMD['congested'].unique()
print(unique_values)
[False True]
# Count the number of True and False values in the 'congested' column
congested_counts = df_vinotion_SMD['congested'].value_counts()
print(congested_counts)
False 205402 True 143 Name: congested, dtype: int64
In the dataset there are 143 congested rows, and 205402 that are not congested, meaning majority are not congested.
# Filter the DataFrame to display only rows where 'congested' is True
congested_true_df = df_vinotion_SMD[df_vinotion_SMD['congested'] == True]
# Display the filtered DataFrame
congested_true_df
| dateObservedFrom | source | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | avg_speed_last_hour | speed_decrease | congested | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41495 | 2021-11-02 16:17:00 | wss://10.252.229.12/api/v2/data/main/object | [Car, Cycling, Car, Car, Car, Car, Car, Motorc... | 32.308879 | [2e43d99f-558a-4828-b432-4084c68dd52f, 3946c6f... | 41496 | 2021-11-02 16:18:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 36 | Vinotion | 1.714286 | 51.197622 | -0.368938 | True |
| 41501 | 2021-11-02 16:23:00 | wss://10.252.229.12/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 30.933765 | [8e05bf00-3258-4a9c-9b0d-855c8bb16fb9, bc59735... | 41502 | 2021-11-02 16:24:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 38 | Vinotion | 1.621622 | 49.121578 | -0.370261 | True |
| 41502 | 2021-11-02 16:24:00 | wss://10.252.229.12/api/v2/data/main/object | [Car, Motorcycling, Cycling, Car, Car, Car, Ca... | 25.126082 | [347ff0cc-de2c-47ef-a04d-546ee23ae335, 14fc51a... | 41503 | 2021-11-02 16:25:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 36 | Vinotion | 1.714286 | 48.615450 | -0.483167 | True |
| 41508 | 2021-11-02 16:30:00 | wss://10.252.229.12/api/v2/data/main/object | [Car, Cycling, Cycling, Car, Cycling, Car, Wal... | 29.997070 | [9e88f046-9b9d-416f-8aa5-e48eeb26f6dc, 76be04b... | 41509 | 2021-11-02 16:31:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 37 | Vinotion | 1.666667 | 45.732962 | -0.344082 | True |
| 41509 | 2021-11-02 16:31:00 | wss://10.252.229.12/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 18.754191 | [010ef025-89b8-489b-aba4-6d6fa21f8d83, ca15e18... | 41510 | 2021-11-02 16:32:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 34 | Vinotion | 1.818182 | 45.049178 | -0.583695 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180668 | 2021-11-13 16:23:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Cycling, C... | 27.225098 | [8d594d33-1d73-4d48-b693-883d3e0a7e95, 9b9097a... | 180669 | 2021-11-13 16:24:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 11 | Vinotion | 6.000000 | 42.689133 | -0.362248 | True |
| 180676 | 2021-11-13 16:31:00 | wss://10.252.229.15/api/v2/data/main/object | [Cycling, Car, Car, Car, Car, Car, Car, Car, C... | 28.946984 | [504ed4d6-7aaa-430d-a6e7-f337bffb4cf4, c50d9b6... | 180677 | 2021-11-13 16:32:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 10 | Vinotion | 6.666667 | 42.147019 | -0.313190 | True |
| 180677 | 2021-11-13 16:32:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Cycling, C... | 31.796481 | [223fb515-f7f3-4651-9829-7f198ca6f784, a985b97... | 180678 | 2021-11-13 16:33:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 19 | Vinotion | 3.333333 | 41.844839 | -0.240134 | True |
| 180683 | 2021-11-13 16:38:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car] | 29.953820 | [d6d45086-0f16-4b59-ad40-eed3d821c6a0, 6c77827... | 180684 | 2021-11-13 16:39:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 9 | Vinotion | 7.500000 | 39.294381 | -0.237707 | True |
| 180684 | 2021-11-13 16:39:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 31.171457 | [8ba05ce3-db6b-4db6-8c4a-3d267cb03f3b, e10b88a... | 180685 | 2021-11-13 16:40:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 10 | Vinotion | 6.666667 | 38.980352 | -0.200329 | True |
143 rows × 16 columns
Drop the columns 'avg_speed_last_hour' and 'speed_decrease' since they are not needed in the SMD model.
df_vinotion_SMD = df_vinotion_SMD.drop(columns=['avg_speed_last_hour', 'speed_decrease'])
df_vinotion_SMD
| dateObservedFrom | source | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | congested | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-11-02 09:51:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Truck, Car, Car, Car, Truck] | 45.112545 | [030e5136-bcb1-4162-b686-41cc63667155, 2880199... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 | False |
| 1 | 2021-11-02 09:52:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.883007 | [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 | False |
| 2 | 2021-11-02 09:53:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 39.584312 | [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 | False |
| 3 | 2021-11-02 09:54:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 38.379448 | [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 | False |
| 4 | 2021-11-02 09:55:00 | wss://10.252.229.11/api/v2/data/main/object | [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... | 40.971777 | [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 2021-11-30 22:55:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car] | 56.446200 | [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 | False |
| 205541 | 2021-11-30 22:56:00 | wss://10.252.229.15/api/v2/data/main/object | [Car, Car, Car] | 58.041480 | [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 | False |
| 205542 | 2021-11-30 22:57:00 | wss://10.252.229.15/api/v2/data/main/object | [] | NaN | [] | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN | False |
| 205543 | 2021-11-30 22:58:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.285480 | [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | False |
| 205544 | 2021-11-30 22:59:00 | wss://10.252.229.15/api/v2/data/main/object | [Car] | 54.156960 | [522263fa-6df0-4280-afa6-365ffc0a552b] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | False |
205545 rows × 14 columns
Converting columns to strings and replacing empty lists with NaN values.
# Convert the 'vehicleType' column to string
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].astype(str)
# Convert the 'name' column to string
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].astype(str)
# Replace empty lists of 'vehicleType' with NaN
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].replace('[]', np.nan)
# Replace empty string of 'vehicleType' with NaN
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].replace('[]', np.nan)
df_vinotion_SMD
| dateObservedFrom | source | vehicleType | averageVehicleSpeed | name | id | dateObservedTo | type | dateCreated | dateModified | intensity | dataProvider | averageHeadwayTime | congested | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-11-02 09:51:00 | wss://10.252.229.11/api/v2/data/main/object | ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... | 45.112545 | ['030e5136-bcb1-4162-b686-41cc63667155', '2880... | 1 | 2021-11-02 09:52:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 8 | Vinotion | 8.571429 | False |
| 1 | 2021-11-02 09:52:00 | wss://10.252.229.11/api/v2/data/main/object | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.883007 | ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... | 2 | 2021-11-02 09:53:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 24 | Vinotion | 2.608696 | False |
| 2 | 2021-11-02 09:53:00 | wss://10.252.229.11/api/v2/data/main/object | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 39.584312 | ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... | 3 | 2021-11-02 09:54:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 28 | Vinotion | 2.222222 | False |
| 3 | 2021-11-02 09:54:00 | wss://10.252.229.11/api/v2/data/main/object | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.379448 | ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... | 4 | 2021-11-02 09:55:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 31 | Vinotion | 2.000000 | False |
| 4 | 2021-11-02 09:55:00 | wss://10.252.229.11/api/v2/data/main/object | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 40.971777 | ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... | 5 | 2021-11-02 09:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 23 | Vinotion | 2.727273 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 2021-11-30 22:55:00 | wss://10.252.229.15/api/v2/data/main/object | ['Car', 'Car'] | 56.446200 | ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... | 205541 | 2021-11-30 22:56:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 2 | Vinotion | 60.000000 | False |
| 205541 | 2021-11-30 22:56:00 | wss://10.252.229.15/api/v2/data/main/object | ['Car', 'Car', 'Car'] | 58.041480 | ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... | 205542 | 2021-11-30 22:57:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 3 | Vinotion | 30.000000 | False |
| 205542 | 2021-11-30 22:57:00 | wss://10.252.229.15/api/v2/data/main/object | NaN | NaN | NaN | 205543 | 2021-11-30 22:58:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 0 | Vinotion | NaN | False |
| 205543 | 2021-11-30 22:58:00 | wss://10.252.229.15/api/v2/data/main/object | ['Car'] | 54.285480 | ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] | 205544 | 2021-11-30 22:59:00 | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | False |
| 205544 | 2021-11-30 22:59:00 | wss://10.252.229.15/api/v2/data/main/object | ['Car'] | 54.156960 | ['522263fa-6df0-4280-afa6-365ffc0a552b'] | 205545 | NaT | TrafficFlowObserved | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 | 1 | Vinotion | NaN | False |
205545 rows × 14 columns
Re-ordering columns to make it more readable.
#re-ordering columns
df_vinotion_SMD = df_vinotion_SMD[['id','dataProvider','type','name','source','dateObservedFrom','dateObservedTo','vehicleType',
'averageVehicleSpeed','intensity','averageHeadwayTime','congested','dateCreated','dateModified']]
df_vinotion_SMD
| id | dataProvider | type | name | source | dateObservedFrom | dateObservedTo | vehicleType | averageVehicleSpeed | intensity | averageHeadwayTime | congested | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Vinotion | TrafficFlowObserved | ['030e5136-bcb1-4162-b686-41cc63667155', '2880... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | 2021-11-02 09:52:00 | ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... | 45.112545 | 8 | 8.571429 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 1 | 2 | Vinotion | TrafficFlowObserved | ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | 2021-11-02 09:53:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.883007 | 24 | 2.608696 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 2 | 3 | Vinotion | TrafficFlowObserved | ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | 2021-11-02 09:54:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 39.584312 | 28 | 2.222222 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 3 | 4 | Vinotion | TrafficFlowObserved | ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | 2021-11-02 09:55:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.379448 | 31 | 2.000000 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 4 | 5 | Vinotion | TrafficFlowObserved | ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | 2021-11-02 09:56:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 40.971777 | 23 | 2.727273 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 205541 | Vinotion | TrafficFlowObserved | ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | 2021-11-30 22:56:00 | ['Car', 'Car'] | 56.446200 | 2 | 60.000000 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205541 | 205542 | Vinotion | TrafficFlowObserved | ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | 2021-11-30 22:57:00 | ['Car', 'Car', 'Car'] | 58.041480 | 3 | 30.000000 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205542 | 205543 | Vinotion | TrafficFlowObserved | NaN | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | 2021-11-30 22:58:00 | NaN | NaN | 0 | NaN | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205543 | 205544 | Vinotion | TrafficFlowObserved | ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | 2021-11-30 22:59:00 | ['Car'] | 54.285480 | 1 | NaN | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205544 | 205545 | Vinotion | TrafficFlowObserved | ['522263fa-6df0-4280-afa6-365ffc0a552b'] | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | NaT | ['Car'] | 54.156960 | 1 | NaN | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
205545 rows × 14 columns
df_vinotion_SMD
| id | dataProvider | type | name | source | dateObservedFrom | dateObservedTo | vehicleType | averageVehicleSpeed | intensity | averageHeadwayTime | congested | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Vinotion | TrafficFlowObserved | ['030e5136-bcb1-4162-b686-41cc63667155', '2880... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | 2021-11-02 09:52:00 | ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... | 45.112545 | 8 | 8.571429 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 1 | 2 | Vinotion | TrafficFlowObserved | ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | 2021-11-02 09:53:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.883007 | 24 | 2.608696 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 2 | 3 | Vinotion | TrafficFlowObserved | ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | 2021-11-02 09:54:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 39.584312 | 28 | 2.222222 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 3 | 4 | Vinotion | TrafficFlowObserved | ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | 2021-11-02 09:55:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.379448 | 31 | 2.000000 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 4 | 5 | Vinotion | TrafficFlowObserved | ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | 2021-11-02 09:56:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 40.971777 | 23 | 2.727273 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 205541 | Vinotion | TrafficFlowObserved | ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | 2021-11-30 22:56:00 | ['Car', 'Car'] | 56.446200 | 2 | 60.000000 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205541 | 205542 | Vinotion | TrafficFlowObserved | ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | 2021-11-30 22:57:00 | ['Car', 'Car', 'Car'] | 58.041480 | 3 | 30.000000 | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205542 | 205543 | Vinotion | TrafficFlowObserved | NaN | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | 2021-11-30 22:58:00 | NaN | NaN | 0 | NaN | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205543 | 205544 | Vinotion | TrafficFlowObserved | ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | 2021-11-30 22:59:00 | ['Car'] | 54.285480 | 1 | NaN | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
| 205544 | 205545 | Vinotion | TrafficFlowObserved | ['522263fa-6df0-4280-afa6-365ffc0a552b'] | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | NaT | ['Car'] | 54.156960 | 1 | NaN | False | 2024-07-11 12:45:57.473436 | 2024-07-11 12:45:57.475993 |
205545 rows × 14 columns
The data frame table above shows the end result of how the data that will be published for Vinotion would look like.
df_vinotion_SMD.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 205545 entries, 0 to 205544 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 205545 non-null int64 1 dataProvider 205545 non-null object 2 type 205545 non-null object 3 name 176949 non-null object 4 source 205545 non-null object 5 dateObservedFrom 205545 non-null datetime64[ns] 6 dateObservedTo 205540 non-null datetime64[ns] 7 vehicleType 176949 non-null object 8 averageVehicleSpeed 176949 non-null float64 9 intensity 205545 non-null int64 10 averageHeadwayTime 158418 non-null float64 11 congested 205545 non-null bool 12 dateCreated 205545 non-null datetime64[ns] 13 dateModified 205545 non-null datetime64[ns] dtypes: bool(1), datetime64[ns](4), float64(2), int64(2), object(5) memory usage: 20.6+ MB
Next is to convert the Sorama to the NoiseLevelObserved Smart Data Model. The data is aggregated inheritly to one second durations, therefore it is not required to aggregate the data more, although it can be aggregated to larger intervals, such as one minute intervals.
df_sorama_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/modified_data.csv', sep=',')
df_sorama_full
| deviceId | Content_time | Content_value | Date | Time | Year | Month | Hour | Datetime | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 | 1900-01-01 09:49:22.310 | 2021 | 11 | 9 | 2021-11-02 09:49:22.310000-01:00 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 | 1900-01-01 09:49:23.310 | 2021 | 11 | 9 | 2021-11-02 09:49:23.310000-01:00 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 | 1900-01-01 09:49:24.310 | 2021 | 11 | 9 | 2021-11-02 09:49:24.310000-01:00 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 | 1900-01-01 09:49:25.310 | 2021 | 11 | 9 | 2021-11-02 09:49:25.310000-01:00 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 | 1900-01-01 09:49:26.310 | 2021 | 11 | 9 | 2021-11-02 09:49:26.310000-01:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 | 1900-01-01 09:49:06.610 | 2021 | 11 | 9 | 2021-11-02 09:49:06.610000-01:00 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 | 1900-01-01 09:49:07.610 | 2021 | 11 | 9 | 2021-11-02 09:49:07.610000-01:00 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 | 1900-01-01 09:49:08.610 | 2021 | 11 | 9 | 2021-11-02 09:49:08.610000-01:00 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 | 1900-01-01 09:49:09.610 | 2021 | 11 | 9 | 2021-11-02 09:49:09.610000-01:00 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 | 1900-01-01 09:49:10.610 | 2021 | 11 | 9 | 2021-11-02 09:49:10.610000-01:00 |
1048575 rows × 9 columns
Select specific columns and create new DataFrame.
df_sorama_SMD = df_sorama_full[['deviceId', 'Content_time', 'Content_value']].copy()
df_sorama_SMD
| deviceId | Content_time | Content_value | |
|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 |
| ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 |
1048575 rows × 3 columns
Renaming datetime_short to dateObservedTo
df_sorama_SMD['dateObservedTo'] = df_sorama_SMD['Content_time'].copy()
df_sorama_SMD
| deviceId | Content_time | Content_value | dateObservedTo | |
|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 |
| ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 |
1048575 rows × 4 columns
Adding the "type" column which is "NoiseLevelObserved" Smart Data Model.
#TrafficFlowObserved
df_sorama_SMD['type'] = 'NoiseLevelObserved'
df_sorama_SMD
| deviceId | Content_time | Content_value | dateObservedTo | type | |
|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 | NoiseLevelObserved |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 | NoiseLevelObserved |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 | NoiseLevelObserved |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 | NoiseLevelObserved |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 | NoiseLevelObserved |
| ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 | NoiseLevelObserved |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 | NoiseLevelObserved |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 | NoiseLevelObserved |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 | NoiseLevelObserved |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 | NoiseLevelObserved |
1048575 rows × 5 columns
Renaming columns to match the Smart Data Model.
df_sorama_SMD = df_sorama_SMD.rename(columns={'deviceId': 'refDevice', 'Content_time': 'dateObservedFrom', 'Content_value':'LAeq'})
df_sorama_SMD
| refDevice | dateObservedFrom | LAeq | dateObservedTo | type | |
|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 | NoiseLevelObserved |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 | NoiseLevelObserved |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 | NoiseLevelObserved |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 | NoiseLevelObserved |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 | NoiseLevelObserved |
| ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 | NoiseLevelObserved |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 | NoiseLevelObserved |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 | NoiseLevelObserved |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 | NoiseLevelObserved |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 | NoiseLevelObserved |
1048575 rows × 5 columns
# Add dateCreated and dateModified columns
df_sorama_SMD['dateCreated'] = pd.Timestamp.now()
df_sorama_SMD['dateModified'] = pd.Timestamp.now()
df_sorama_SMD
| refDevice | dateObservedFrom | LAeq | dateObservedTo | type | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
1048575 rows × 7 columns
Adding dataProvider column which is "Sorama".
# Add dataProvider column
df_sorama_SMD['dataProvider'] = 'Sorama'
df_sorama_SMD
| refDevice | dateObservedFrom | LAeq | dateObservedTo | type | dateCreated | dateModified | dataProvider | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama |
1048575 rows × 8 columns
Addiing an id column as a primary key column, this can be changed to be more specific, in this case it's numbered from 1 till the end.
# Add a unique id column (starting from 1)
df_sorama_SMD['id'] = range(1, len(df_sorama_SMD) + 1)
df_sorama_SMD
| refDevice | dateObservedFrom | LAeq | dateObservedTo | type | dateCreated | dateModified | dataProvider | id | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 2 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 3 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 4 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048571 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048572 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048573 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048574 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048575 |
1048575 rows × 9 columns
The locations are required for the Sorama sound sensors. To get the locations the image below is used with google maps to manually aquire the locations.
Adding the location column with the coordinates from the image above and google maps. These co-ordinates is from google maps and maybe incorrect, and is currently only for one device (listener 5 or 7 or 8 ,not sure) since the DeviceID is rounded off in the CSV file. I checked to see if I could revert it or if excel was doing it but it seems to not be the case.
coordinates = "5.481549426062068, 51.452869111964304"
df_sorama_SMD['location'] = [coordinates] * len(df_sorama_SMD)
df_sorama_SMD
| refDevice | dateObservedFrom | LAeq | dateObservedTo | type | dateCreated | dateModified | dataProvider | id | location | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2,03804E+18 | 2021-11-02 09:49:22.310 | 57.077148 | 2021-11-02 09:49:22.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1 | 5.481549426062068, 51.452869111964304 |
| 1 | 2,03804E+18 | 2021-11-02 09:49:23.310 | 57.639920 | 2021-11-02 09:49:23.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 2 | 5.481549426062068, 51.452869111964304 |
| 2 | 2,03804E+18 | 2021-11-02 09:49:24.310 | 58.501630 | 2021-11-02 09:49:24.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 3 | 5.481549426062068, 51.452869111964304 |
| 3 | 2,03804E+18 | 2021-11-02 09:49:25.310 | 57.684305 | 2021-11-02 09:49:25.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 4 | 5.481549426062068, 51.452869111964304 |
| 4 | 2,03804E+18 | 2021-11-02 09:49:26.310 | 57.416267 | 2021-11-02 09:49:26.310 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 5 | 5.481549426062068, 51.452869111964304 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 2,03804E+18 | 2021-11-02 09:49:06.610 | 58.866556 | 2021-11-02 09:49:06.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048571 | 5.481549426062068, 51.452869111964304 |
| 1048571 | 2,03804E+18 | 2021-11-02 09:49:07.610 | 57.896402 | 2021-11-02 09:49:07.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048572 | 5.481549426062068, 51.452869111964304 |
| 1048572 | 2,03804E+18 | 2021-11-02 09:49:08.610 | 58.556143 | 2021-11-02 09:49:08.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048573 | 5.481549426062068, 51.452869111964304 |
| 1048573 | 2,03804E+18 | 2021-11-02 09:49:09.610 | 58.915390 | 2021-11-02 09:49:09.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048574 | 5.481549426062068, 51.452869111964304 |
| 1048574 | 2,03804E+18 | 2021-11-02 09:49:10.610 | 59.383412 | 2021-11-02 09:49:10.610 | NoiseLevelObserved | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 | Sorama | 1048575 | 5.481549426062068, 51.452869111964304 |
1048575 rows × 10 columns
df_sorama_SMD.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1048575 entries, 0 to 1048574 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 refDevice 1048575 non-null object 1 dateObservedFrom 1048575 non-null object 2 LAeq 1048575 non-null float64 3 dateObservedTo 1048575 non-null object 4 type 1048575 non-null object 5 dateCreated 1048575 non-null datetime64[ns] 6 dateModified 1048575 non-null datetime64[ns] 7 dataProvider 1048575 non-null object 8 id 1048575 non-null int64 9 location 1048575 non-null object dtypes: datetime64[ns](2), float64(1), int64(1), object(6) memory usage: 80.0+ MB
Converting columns types to match Smart Data Model types.
# Convert dateObservedFrom and dateObservedTo to datetime
df_sorama_SMD['dateObservedFrom'] = pd.to_datetime(df_sorama_SMD['dateObservedFrom'], errors='coerce')
df_sorama_SMD['dateObservedTo'] = pd.to_datetime(df_sorama_SMD['dateObservedTo'], errors='coerce')
Convert type to string.
df_sorama_SMD['type'] = df_sorama_SMD['type'].astype(str)
Convert refDevice to string.
df_sorama_SMD['refDevice'] = df_sorama_SMD['refDevice'].astype(str)
Convert LAeq to numeric (float).
df_sorama_SMD['LAeq'] = pd.to_numeric(df_sorama_SMD['LAeq'], errors='coerce')
Convert dataProvider to string.
df_sorama_SMD['dataProvider'] = df_sorama_SMD['dataProvider'].astype(str)
Re-ordering columns to make the data more readable.
df_sorama_SMD = df_sorama_SMD[['id','dataProvider','type','refDevice', 'dateObservedFrom', 'dateObservedTo', 'LAeq','location','dateCreated','dateModified']]
df_sorama_SMD
| id | dataProvider | type | refDevice | dateObservedFrom | dateObservedTo | LAeq | location | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:22.310 | 2021-11-02 09:49:22.310 | 57.077148 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1 | 2 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:23.310 | 2021-11-02 09:49:23.310 | 57.639920 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 2 | 3 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:24.310 | 2021-11-02 09:49:24.310 | 58.501630 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 3 | 4 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:25.310 | 2021-11-02 09:49:25.310 | 57.684305 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 4 | 5 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:26.310 | 2021-11-02 09:49:26.310 | 57.416267 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 1048571 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:06.610 | 2021-11-02 09:49:06.610 | 58.866556 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048571 | 1048572 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:07.610 | 2021-11-02 09:49:07.610 | 57.896402 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048572 | 1048573 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:08.610 | 2021-11-02 09:49:08.610 | 58.556143 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048573 | 1048574 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:09.610 | 2021-11-02 09:49:09.610 | 58.915390 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048574 | 1048575 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:10.610 | 2021-11-02 09:49:10.610 | 59.383412 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
1048575 rows × 10 columns
df_sorama_SMD
| id | dataProvider | type | refDevice | dateObservedFrom | dateObservedTo | LAeq | location | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:22.310 | 2021-11-02 09:49:22.310 | 57.077148 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1 | 2 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:23.310 | 2021-11-02 09:49:23.310 | 57.639920 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 2 | 3 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:24.310 | 2021-11-02 09:49:24.310 | 58.501630 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 3 | 4 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:25.310 | 2021-11-02 09:49:25.310 | 57.684305 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 4 | 5 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:26.310 | 2021-11-02 09:49:26.310 | 57.416267 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 1048571 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:06.610 | 2021-11-02 09:49:06.610 | 58.866556 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048571 | 1048572 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:07.610 | 2021-11-02 09:49:07.610 | 57.896402 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048572 | 1048573 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:08.610 | 2021-11-02 09:49:08.610 | 58.556143 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048573 | 1048574 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:09.610 | 2021-11-02 09:49:09.610 | 58.915390 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
| 1048574 | 1048575 | Sorama | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:10.610 | 2021-11-02 09:49:10.610 | 59.383412 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:03.369713 | 2024-07-11 12:46:03.371162 |
1048575 rows × 10 columns
The table above shows the end result of the conversion for the Sorama data. With the columns types listed below
df_sorama_SMD.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1048575 entries, 0 to 1048574 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 1048575 non-null int64 1 dataProvider 1048575 non-null object 2 type 1048575 non-null object 3 refDevice 1048575 non-null object 4 dateObservedFrom 1048575 non-null datetime64[ns] 5 dateObservedTo 1048575 non-null datetime64[ns] 6 LAeq 1048575 non-null float64 7 location 1048575 non-null object 8 dateCreated 1048575 non-null datetime64[ns] 9 dateModified 1048575 non-null datetime64[ns] dtypes: datetime64[ns](4), float64(1), int64(1), object(4) memory usage: 80.0+ MB
Genertating Scripts to make all the steps made in the conversion in one single script. This script can be used in a live connection in the data pipeline for uploading the data to a Smart Data Model. Right now the input is the CSV files and output also CSV files and JSON files.
import pandas as pd
import numpy as np
df_vinotion_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/df_vinotion_with_seconds.csv', sep=',')
# Select specific columns
df_vinotion_SMD = df_vinotion_full[['datetime_short','cameraId', 'classification', 'speed', 'uuid']].copy()
# Convert 'dateObservedFrom' and 'dateObservedTo' columns to datetime type
df_vinotion_SMD['datetime_short'] = pd.to_datetime(df_vinotion_SMD['datetime_short'], format='%Y-%m-%d:%H:%M:%S')
# Set datetime_short as the index
df_vinotion_SMD.set_index('datetime_short', inplace=True)
# Resample to one-minute intervals and aggregate by cameraId
df_vinotion_SMD = df_vinotion_SMD.groupby('cameraId').resample('1T').agg({
'classification': lambda x: list(x), # Collect all classifications into a list
'speed': 'mean', # Calculate the average speed
'uuid': lambda x: list(x) # Collect all UUIDs into a list
}).copy()
# Reset index to make datetime_short a column again
df_vinotion_SMD.reset_index(inplace=True)
# Add a unique id column (starting from 1)
df_vinotion_SMD['id'] = range(1, len(df_vinotion_SMD) + 1)
# Rename datetime_short column to dateObservedFrom
df_vinotion_SMD.rename(columns={'datetime_short': 'dateObservedFrom'}, inplace=True)
# Create dateObservedTo by shifting dateObservedFrom by one minute and grouping by cameraID
df_vinotion_SMD['dateObservedTo'] = df_vinotion_SMD.groupby('cameraId')['dateObservedFrom'].shift(-1)
#TrafficFlowObserved
df_vinotion_SMD['type'] = 'TrafficFlowObserved'
#renaming columns
df_vinotion_SMD = df_vinotion_SMD.rename(columns={'cameraId': 'source', 'classification':'vehicleType', 'speed':'averageVehicleSpeed','uuid':'name'})
# Add dateCreated and dateModified columns
df_vinotion_SMD['dateCreated'] = pd.Timestamp.now()
df_vinotion_SMD['dateModified'] = pd.Timestamp.now()
# Calculate intensity: Total number of vehicles detected during this observation period
df_vinotion_SMD['intensity'] = df_vinotion_SMD['vehicleType'].apply(len)
# Add dataProvider column
df_vinotion_SMD['dataProvider'] = 'Vinotion'
# Calculate averageHeadwayTime: Average headway time
# The average headway time represents the average time in seconds between each vehicle observed,
# indicating how close or far each vehicle is from one another.
df_vinotion_SMD['averageHeadwayTime'] = df_vinotion_SMD.groupby('source')['dateObservedFrom'].diff().dt.total_seconds().shift(-1) / (df_vinotion_SMD['intensity'] -1)
# Replace infinite values and negative values with NaN.
df_vinotion_SMD['averageHeadwayTime'].replace([np.inf, -np.inf], np.nan, inplace=True)
df_vinotion_SMD.loc[df_vinotion_SMD['averageHeadwayTime'] < 0, 'averageHeadwayTime'] = np.nan
# Calculate average speed of the last hour
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window=60).mean().reset_index(level=0, drop=True)
df_vinotion_SMD.set_index('dateObservedFrom', inplace=True)
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window='3600s').mean().reset_index(level=0, drop=True)
df_vinotion_SMD.reset_index(inplace=True)
# Calculate speed decrease compared to the average speed of the last hour
df_vinotion_SMD['speed_decrease'] = (df_vinotion_SMD['averageVehicleSpeed'] - df_vinotion_SMD['avg_speed_last_hour']) / df_vinotion_SMD['avg_speed_last_hour']
# Identify congested periods based on the defined threshold
congested_threshold = 0.2 # 20% speed decrease
congested_duration_threshold = 5 # 5 minutes
df_vinotion_SMD['congested'] = (df_vinotion_SMD['speed_decrease'] <= -congested_threshold).rolling(window=congested_duration_threshold).sum() >= congested_duration_threshold
# Filter the DataFrame to display only rows where 'congested' is True
congested_true_df = df_vinotion_SMD[df_vinotion_SMD['congested'] == True]
# Drop the columns 'avg_speed_last_hour' and 'speed_decrease' since they are not needed in the SMD model
df_vinotion_SMD = df_vinotion_SMD.drop(columns=['avg_speed_last_hour', 'speed_decrease'])
# Convert the 'vehicleType' column to string
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].astype(str)
# Convert the 'name' column to string
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].astype(str)
# Replace empty lists of 'vehicleType' with NaN
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].replace('[]', np.nan)
# Replace empty string of 'vehicleType' with NaN
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].replace('[]', np.nan)
#TrafficFlowObserved SMD specified
df_vinotion_SMD['type'] = 'TrafficFlowObserved'
#re-ordering columns
df_vinotion_SMD = df_vinotion_SMD[['id','dataProvider','type','name','source','dateObservedFrom','dateObservedTo','vehicleType',
'averageVehicleSpeed','intensity','averageHeadwayTime','congested','dateCreated','dateModified']]
# Save the modified DataFrame to a new CSV file (optional)
df_vinotion_SMD.to_csv('/Users/maxwellernst/Documents/data/modified_vinotion_SMD.csv', index=False)
# Export DataFrame to JSON file
df_vinotion_SMD.to_json('/Users/maxwellernst/Documents/data/modified_vinotion_SMD.json', orient='records')
# Display the DataFrame
df_vinotion_SMD
| id | dataProvider | type | name | source | dateObservedFrom | dateObservedTo | vehicleType | averageVehicleSpeed | intensity | averageHeadwayTime | congested | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Vinotion | TrafficFlowObserved | ['030e5136-bcb1-4162-b686-41cc63667155', '2880... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:51:00 | 2021-11-02 09:52:00 | ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... | 45.112545 | 8 | 8.571429 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 1 | 2 | Vinotion | TrafficFlowObserved | ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:52:00 | 2021-11-02 09:53:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.883007 | 24 | 2.608696 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 2 | 3 | Vinotion | TrafficFlowObserved | ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:53:00 | 2021-11-02 09:54:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 39.584312 | 28 | 2.222222 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 3 | 4 | Vinotion | TrafficFlowObserved | ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:54:00 | 2021-11-02 09:55:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 38.379448 | 31 | 2.000000 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 4 | 5 | Vinotion | TrafficFlowObserved | ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... | wss://10.252.229.11/api/v2/data/main/object | 2021-11-02 09:55:00 | 2021-11-02 09:56:00 | ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... | 40.971777 | 23 | 2.727273 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205540 | 205541 | Vinotion | TrafficFlowObserved | ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:55:00 | 2021-11-30 22:56:00 | ['Car', 'Car'] | 56.446200 | 2 | 60.000000 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 205541 | 205542 | Vinotion | TrafficFlowObserved | ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:56:00 | 2021-11-30 22:57:00 | ['Car', 'Car', 'Car'] | 58.041480 | 3 | 30.000000 | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 205542 | 205543 | Vinotion | TrafficFlowObserved | NaN | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:57:00 | 2021-11-30 22:58:00 | NaN | NaN | 0 | NaN | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 205543 | 205544 | Vinotion | TrafficFlowObserved | ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:58:00 | 2021-11-30 22:59:00 | ['Car'] | 54.285480 | 1 | NaN | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
| 205544 | 205545 | Vinotion | TrafficFlowObserved | ['522263fa-6df0-4280-afa6-365ffc0a552b'] | wss://10.252.229.15/api/v2/data/main/object | 2021-11-30 22:59:00 | NaT | ['Car'] | 54.156960 | 1 | NaN | False | 2024-07-11 12:46:33.196966 | 2024-07-11 12:46:33.197610 |
205545 rows × 14 columns
import pandas as pd
import numpy as np
# Load the CSV file
df_sorama_full = pd.read_csv('/Users/maxwellernst/Documents/data/modified_data.csv', sep=',')
# Select specific columns and create new DataFrame
df_sorama_SMD = df_sorama_full[['deviceId', 'Content_time', 'Content_value']].copy()
# Create necessary columns with specified values
df_sorama_SMD['dateObservedFrom'] = df_sorama_SMD['Content_time'].copy()
df_sorama_SMD['dateObservedTo'] = df_sorama_SMD['Content_time'].copy()
df_sorama_SMD['type'] = 'NoiseLevelObserved'
df_sorama_SMD['dateCreated'] = pd.Timestamp.now()
df_sorama_SMD['dateModified'] = pd.Timestamp.now()
df_sorama_SMD['dataProvider'] = 'Vinotion'
df_sorama_SMD['id'] = range(1, len(df_sorama_SMD) + 1)
coordinates = "5.481549426062068, 51.452869111964304"
df_sorama_SMD['location'] = [coordinates] * len(df_sorama_SMD)
# Rename columns to match the desired format
df_sorama_SMD.rename(columns={'deviceId': 'refDevice', 'Content_value': 'LAeq'}, inplace=True)
# Convert dateObservedFrom and dateObservedTo to datetime
df_sorama_SMD['dateObservedFrom'] = pd.to_datetime(df_sorama_SMD['dateObservedFrom'], errors='coerce')
df_sorama_SMD['dateObservedTo'] = pd.to_datetime(df_sorama_SMD['dateObservedTo'], errors='coerce')
# Convert LAeq to numeric (float)
df_sorama_SMD['LAeq'] = pd.to_numeric(df_sorama_SMD['LAeq'], errors='coerce')
# Convert dataProvider to string
df_sorama_SMD['dataProvider'] = df_sorama_SMD['dataProvider'].astype(str)
#re-ordering columns
df_sorama_SMD = df_sorama_SMD[['id','dataProvider','type','refDevice', 'dateObservedFrom', 'dateObservedTo', 'LAeq','location','dateCreated','dateModified']]
# Save the modified DataFrame to a new CSV file (optional)
df_sorama_SMD.to_csv('/Users/maxwellernst/Documents/data/modified_sorama_SMD.csv', index=False)
# Export DataFrame to JSON file
df_vinotion_SMD.to_json('/Users/maxwellernst/Documents/data/modified_sorama_SMD.json', orient='records')
# Display the DataFrame
df_sorama_SMD
| id | dataProvider | type | refDevice | dateObservedFrom | dateObservedTo | LAeq | location | dateCreated | dateModified | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:22.310 | 2021-11-02 09:49:22.310 | 57.077148 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 1 | 2 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:23.310 | 2021-11-02 09:49:23.310 | 57.639920 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 2 | 3 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:24.310 | 2021-11-02 09:49:24.310 | 58.501630 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 3 | 4 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:25.310 | 2021-11-02 09:49:25.310 | 57.684305 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 4 | 5 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:26.310 | 2021-11-02 09:49:26.310 | 57.416267 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048570 | 1048571 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:06.610 | 2021-11-02 09:49:06.610 | 58.866556 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 1048571 | 1048572 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:07.610 | 2021-11-02 09:49:07.610 | 57.896402 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 1048572 | 1048573 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:08.610 | 2021-11-02 09:49:08.610 | 58.556143 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 1048573 | 1048574 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:09.610 | 2021-11-02 09:49:09.610 | 58.915390 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
| 1048574 | 1048575 | Vinotion | NoiseLevelObserved | 2,03804E+18 | 2021-11-02 09:49:10.610 | 2021-11-02 09:49:10.610 | 59.383412 | 5.481549426062068, 51.452869111964304 | 2024-07-11 12:46:46.267180 | 2024-07-11 12:46:46.268417 |
1048575 rows × 10 columns
From the scripts the a JSON schema is created, and an input example is used of the first row to test if the the input data will match the JSON schema
import json
from jsonschema import validate, ValidationError
# Define the JSON schema
schema1 = {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Vinotion SMD Data",
"type": "object",
"properties": {
"id": { "type": "integer" },
"dataProvider": { "type": "string" },
"type": { "type": "string" },
"name": {
"type": ["array", "null"],
"items": { "type": "string" }
},
"source": { "type": "string" },
"dateObservedFrom": { "type": "string", "format": "date-time" },
"dateObservedTo": { "type": "string", "format": "date-time" },
"vehicleType": {
"type": ["array", "null"],
"items": { "type": "string" }
},
"averageVehicleSpeed": {
"type": ["number", "null"]
},
"intensity": { "type": "integer" },
"averageHeadwayTime": {
"type": ["number", "null"]
},
"congested": { "type": "boolean" },
"dateCreated": { "type": "string", "format": "date-time" },
"dateModified": { "type": "string", "format": "date-time" }
},
"required": [
"id",
"type",
"dateObservedFrom",
"dateObservedTo",
]
}
# Define the JSON schema
schema2 = {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Sorama SMD Data",
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"dataProvider": {
"type": "string"
},
"type": {
"type": "string"
},
"refDevice": {
"type": "string"
},
"dateObservedFrom": {
"type": "string",
"format": "date-time"
},
"dateObservedTo": {
"type": "string",
"format": "date-time"
},
"LAeq": {
"type": "number"
},
"dateCreated": {
"type": "string",
"format": "date-time"
},
"dateModified": {
"type": "string",
"format": "date-time"
},
"location": {
"type": "string",
"format": "point"
}
},
"required": ["id",
"type",
"dateObservedFrom",
"dateObservedTo",
"location"]
}
# Sample input to validate
input_data = {
"id": 1,
"dataProvider": "Vinotion",
"type": "TrafficFlowObserved",
"name": ["030e5136-bcb1-4162-b686-41cc63667155", "2880..."],
"source": "wss://10.252.229.11/api/v2/data/main/object",
"dateObservedFrom": "2021-11-02T09:51:00",
"dateObservedTo": "2021-11-02T09:52:00",
"vehicleType": ["Car", "Car", "Car", "Truck", "Car", "Car", "..."],
"averageVehicleSpeed": 45.112545,
"intensity": 8,
"averageHeadwayTime": 8.571429,
"congested": False,
"dateCreated": "2024-07-02T11:36:32.281404",
"dateModified": "2024-07-02T11:36:32.282731"
}
# Validate the input against the schema
try:
validate(instance=input_data, schema=schema1)
print("Input is valid.")
except ValidationError as err:
print("Input is invalid:", err.message)
Input is valid.
The Vinotion input matches the schema.
The input is used from the first row of the dataset, to test if the input can match the schema.
# Define the input JSON
input_json ={
"id": 1,
"dataProvider": "Vinotion",
"type": "NoiseLevelObserved",
"refDevice": "2,03804E+18",
"dateObservedFrom": "2021-11-02T09:49:22.310Z",
"dateObservedTo": "2021-11-02T09:49:22.310Z",
"LAeq": 57.077148,
"dateCreated": "2024-05-15T10:49:29.228556Z",
"dateModified": "2024-05-15T10:49:29.228556Z",
"location": "5.481549426062068, 51.452869111964304"
}
# Validate the input JSON against the schema
try:
validate(instance=input_json, schema=schema2)
print("Input JSON is valid.")
except ValidationError as e:
print(f"Input JSON is invalid: {e.message}")
Input JSON is valid.
The Sorama JSON input data matches the schema as shown in the result above.
This notebook concludes the MIM2 data models and sharing for Eindhoven's Inzicht Verlicht Data Model. Although there are improvements to be made but because of time constraints they were noted down for future iterations below in the Recommendations. This ETL process shows a tangible solution of MIM2 and this proved to be useful in creating the Tutorial on how to create MIM2 models. There were many steps involved in converting the data, which seemed to be simple but took a good understanding of the data and communication with the stakeholders in order to create a reliable result.
Comments were added to ensure that the decisions and assumptions made were noted for future iterations. Ideally the scripts made would be used with a live connection to the devices and sensors, and the data would be uploaded to a Data Space through an API.
Must-haves :
Locations of the Sorama sound sensors, the locations are currently aquired manually through google maps and the image of the intersection, this could be done better with the actual locations when thinking of a live connection. Also there is currently only two sensors available, and the ID row values are rounded off based on the orginal CSV values, which could be an issue later on.
refRoadSegment from Smart Data Model : TrafficFlowObserved should be added, which is the RuleID + RealDirection in the form of a URL type.
Nice-to-haves :
LAmax and LAeq_d columns can be added to the Sorama CSV if the data is aggregated is larger intervals such as 1 minute instead of 1 second since it is already aggregated to 1 second intervals, the LAqMax would be the same as the LAq values.
Look into the NoisePollution Smart Data Model for Sorama data conversion.
Can still convert the Air Quality data to the Smart Data Model of AirQualityObserved.
Look into NDW public road data for Eindhoven to maybe enrich the data more. Eindhoven prefer the NDW public data over the BGT national open data.
Look into data from OpenStreetMap.
Example of refDevice or Device ID's that are in the current data, which are rounded off, and are only two.
unique_values = df_sorama_SMD['refDevice'].unique()
print(unique_values)
['2,03804E+18' '1,46158E+18']